Prosper Loans Exploration by Paron Sarampakhul

## [1] 113937     81
## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

The dataset contains 81 variables with 113,937 observations.

Univariate Plots Section

The dataset has two columns for credit grades, one for listings before 2009
and one for listings after July 2009, so I want to see the number of listings
over time. It turns out that there is a gap between late 2008 and mid 2009
with no listings. I wonder whether the gap has something to do with the
financial crisis in late 2008. The number of listings trends upward from 2009
to 2013, possibly because the economy started to recover.
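
A minimal sketch of how such a gap could be located, assuming the timestamps follow the `ListingCreationDate` format shown in the structure output. The four sample timestamps are made up for illustration; the idea is to count listings per calendar month and report the months with a count of zero.

```r
# Hypothetical sample of timestamps in the ListingCreationDate format.
creation <- c("2008-09-14 10:02:11.000000000",
              "2008-10-03 08:15:42.000000000",
              "2009-07-20 17:20:16.550000000",
              "2009-08-02 09:27:44.853000000")

# Keep only the date part and truncate to the first day of the month.
dates <- as.Date(substr(creation, 1, 10))
month_start <- as.Date(format(dates, "%Y-%m-01"))

# Build every calendar month between the first and the last listing.
all_months <- seq(min(month_start), max(month_start), by = "month")

# Count listings per month, keeping months that have no listings at all.
counts <- table(factor(format(month_start, "%Y-%m"),
                       levels = format(all_months, "%Y-%m")))

# Months with a zero count form the gap.
gap_months <- names(counts)[as.vector(counts) == 0]
print(gap_months)
```

Converting the month labels to a factor with explicit levels is what keeps empty months in the table; a plain `table()` on the observed labels would silently skip them.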

I want to see whether the distributions of credit ratings are the same for
pre-2009 and post-2009 listings. I have to re-arrange the ratings from best
to worst for the x-axis. Pre-2009 listings have high numbers of B-rated,
C-rated, and D-rated listings, and there are more AA-rated listings than
A-rated listings. Post-2009 listings have high numbers of A-rated, B-rated,
C-rated, and D-rated listings, and the number of AA-rated listings is much
lower than the number of A-rated listings.
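
The re-arrangement from best to worst could be done along these lines. This is a sketch rather than the exact code behind the plots: the sample vector is made up, and the ordering `AA, A, B, C, D, E, HR` is an assumption about how the letter grades rank.

```r
# Hypothetical sample of letter grades; "" marks listings without a grade.
grade <- factor(c("C", "AA", "", "HR", "B", "A", "D", "E", "C"))

# Assumed ordering from best to worst.
best_to_worst <- c("AA", "A", "B", "C", "D", "E", "HR")

# Re-create the factor with explicit levels; values not listed become NA.
grade_ordered <- factor(grade, levels = best_to_worst, ordered = TRUE)

# table() now reports counts in best-to-worst order and drops the blanks.
print(table(grade_ordered))
```

By default R sorts factor levels alphabetically, which would place A before AA on the x-axis, so the levels have to be given explicitly.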

I want to see the status of the loans. I have to re-arrange the status of
the loans that are past due.

I want to see the distribution of ProsperScore, which is a custom risk score
built using historical Prosper data. According to the provided variable
definitions, the score should range from 1 to 10, with 10 being the best
score. I wonder why some listings have a score of 11.
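
One way to flag scores outside the documented range is sketched below. The values are the first ten `ProsperScore` entries from the structure output above; the 1 to 10 bounds come from the variable definitions.

```r
# First ten ProsperScore values as printed by str() above.
score <- c(NA, 7, NA, 9, 4, 10, 2, 4, 9, 11)

# TRUE where a non-missing score falls outside the documented 1-10 range.
out_of_range <- !is.na(score) & (score < 1 | score > 10)

# Number of offending rows and their positions.
print(sum(out_of_range))
print(which(out_of_range))
```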

## 
##    12    36    60 
##  1614 87778 24545

There are only 3 available terms for loans: 12 months, 36 months and 60
months. Most listings (87,778 of 113,937) have a 36-month term.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229      25
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

Both BorrowerAPR and BorrowerRate look approximately normally distributed. The mean borrower APR is 21.88% and the mean borrower rate is 19.28%. This makes sense because APR is a broader measure of the cost of borrowing: it includes the interest rate (borrower rate) plus other costs. For a mortgage, for example, these include broker fees, discount points and some closing costs (https://www.bankrate.com/finance/mortgages/apr-and-interest-rate.aspx).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1242  0.1730  0.1827  0.2400  0.4925
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -0.183   0.116   0.162   0.169   0.224   0.320   29084
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.005   0.042   0.072   0.080   0.112   0.366   29084
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -0.183   0.074   0.092   0.096   0.117   0.284   29084

The mean lender yield (interest rate less servicing fee) is 18.27%, the mean
estimated effective yield is 16.9%, the mean estimated loss is 8%, and the
mean estimated return is 9.6%. Lender yield, estimated effective yield and
estimated loss look roughly normally distributed, but with a small peak on
the right side. It seems that higher returns are associated with a higher
risk of loss. The net estimated return therefore looks roughly normal without
the small peak on the right side.

The majority of listings fall under the debt consolidation category.

The histogram for occupation shows 2 significant peaks: Other and
Professional. This is because ‘Professional’ is a broad term, and ‘Other’
is a blanket term for occupations that may not be listed in the survey.

The histogram for employment status shows that most borrowers are employed.
I wonder what the difference is between ‘Employed’ and ‘Full-time’.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   26.00   67.00   96.07  137.00  755.00    7625

The histogram for employment duration shows that the majority of borrowers
have been employed for less than 6 years. The median duration is 67 months.

The proportions of borrowers who are homeowners (57,478) and who are not (56,459) are close.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0    38404    56000    67296    81900 21000035

The stated annual income of borrowers is right-skewed: the mean
($67,296) is well above the median ($56,000), and the maximum is far larger
than the third quartile.
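
As a quick sanity check on the direction of skew, the quartiles printed above can be plugged into Bowley's quartile skewness, (Q3 + Q1 - 2 * median) / (Q3 - Q1). This is only a rough indicator and not part of the original analysis; a positive value points to a longer right tail, a negative value to a longer left tail.

```r
# Quartiles of stated annual income, copied from the summary above.
q1  <- 38404
med <- 56000
q3  <- 81900

# Bowley's quartile skewness: positive means the upper half is more spread out.
bowley <- (q3 + q1 - 2 * med) / (q3 - q1)
print(round(bowley, 3))

# The mean can be compared against the median as a second rough indicator.
income_mean <- 67296
print(income_mean > med)
```

Because it relies only on quartiles, this measure is not affected by the extreme maximum in the summary.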

Since the consumer credit rating agency provides only a lower and an upper
bound for each credit score, I take the midpoints and plot them. The credit
scores of the borrowers are left-skewed. I suspect that one of the factors
used to determine a credit score is income.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    7.00   10.00   10.32   13.00   59.00    7604
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    6.00    9.00    9.26   12.00   54.00    7604
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00    6.00    6.97    9.00   51.00

The number of current credit lines, the number of open credit lines, and the
number of open revolving accounts are all right-skewed. I wonder whether
these are correlated with the borrower’s credit score, the listing’s credit
grade, and the listing’s estimated loss.

Univariate Analysis

What is the structure of your dataset?

The dataset consists of 81 variables and 113,937 records. Each record contains information about a loan listing, including the loan key, interest rate, estimated return rate, estimated loss rate, loan amount, loan category, credit rating, borrower information, etc.

What is/are the main feature(s) of interest in your dataset?

The main features in the dataset are borrower rate, estimated effective yield, estimated loss rate, estimated return rate, and credit rating of a loan. I would like to find out what features determine the mentioned values. I suspect that the main factors should include the annual income of borrower, employment status, credit score, number of open credit lines, number of open revolving accounts, current delinquencies, and amount delinquent.

I suspect that a high annual income and employed status will result in a higher credit score. A small number of credit lines, revolving accounts, and delinquencies should also result in a higher credit score. A higher credit score should in turn result in a lower borrower rate, lower estimated loss, and better credit rating of the listing.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Other factors that may determine the credit score include the original loan amount, employment duration, TotalProsperPaymentsBilled, %OnTimeProsperPayments vs %LateProsperPayments, ProsperPrincipalOutstanding, and the number of recommendations the borrower receives.

Did you create any new variables from existing variables in the dataset?

I converted the stated monthly income to an annual figure by multiplying it by 12, so that I could check whether its distribution tallies with the IncomeRange column.

I also computed the midpoint of each borrower’s credit score range by summing the lower and upper bounds and dividing by 2.
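
The two derived variables might be computed as in the sketch below. The three sample rows are illustrative, and the new column names `StatedAnnualIncome` and `CreditScoreMid` are hypothetical, chosen here only for the example.

```r
# Hypothetical rows using column names from the dataset.
loans <- data.frame(
  StatedMonthlyIncome   = c(3083.33, 6125, 2083.33),
  CreditScoreRangeLower = c(640, 680, 480),
  CreditScoreRangeUpper = c(659, 699, 499)
)

# Annualise the stated monthly income.
loans$StatedAnnualIncome <- loans$StatedMonthlyIncome * 12

# Midpoint of the reported credit score range.
loans$CreditScoreMid <-
  (loans$CreditScoreRangeLower + loans$CreditScoreRangeUpper) / 2

print(loans[, c("StatedAnnualIncome", "CreditScoreMid")])
```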

Of the features you investigated, were there any unusual distributions?

I log-transformed the right-skewed distributions of employment duration, revolving monthly payments, amount delinquent, and original loan amount.
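
A sketch of such a transform is shown below. Several of these variables contain zeros, so the example adds 1 before taking the base-10 logarithm; that offset and the choice of base are assumptions for illustration, not necessarily what was used for the plots. The sample values are made up around the range reported for `AmountDelinquent`.

```r
# Hypothetical right-skewed values, including zeros and a missing value.
amount_delinquent <- c(0, 0, 472, 10056, NA, 463881)

# log10(x + 1) keeps zeros at zero and leaves NA as NA.
log_amount <- log10(amount_delinquent + 1)

print(round(log_amount, 2))
```

Without the offset, `log10(0)` would give `-Inf`, and those rows would be dropped or distort the axis when plotted.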

Bivariate Plots Section

##                             BorrowerRate EstimatedLoss EstimatedReturn
## BorrowerRate                 1.000000000    0.90718659     0.744026264
## EstimatedLoss                0.907186588    1.00000000     0.398417591
## EstimatedReturn              0.744026264    0.39841759     1.000000000
## ProsperRating..numeric.     -0.952319844   -0.94049755    -0.585215344
## ProsperScore                -0.754924462   -0.71339475    -0.511464942
## EmploymentStatusDuration    -0.006677846   -0.01104844    -0.001880237
## CreditScoreRangeLower       -0.677637574   -0.63777703    -0.460929838
## CreditScoreRangeUpper       -0.677637574   -0.63777703    -0.460929838
## OpenCreditLines             -0.014502203   -0.02081602    -0.010145916
## OpenRevolvingAccounts       -0.053994561   -0.06154182    -0.027623643
## OpenRevolvingMonthlyPayment -0.011158846   -0.02988925     0.016209249
## CurrentDelinquencies         0.192472078    0.20761201     0.091677962
## AmountDelinquent             0.058324323    0.05694464     0.035214686
## StatedMonthlyIncome         -0.175402514   -0.16310931    -0.129163303
## TotalProsperPaymentsBilled   0.015204584    0.03754666    -0.026208202
## OnTimeProsperPayments        0.002647098    0.02742060    -0.037520478
## LoanOriginalAmount          -0.332415965   -0.37341817    -0.140509818
## Recommendations             -0.007944508    0.01546091    -0.033922843
##                             ProsperRating..numeric. ProsperScore
## BorrowerRate                           -0.952319844 -0.754924462
## EstimatedLoss                          -0.940497550 -0.713394750
## EstimatedReturn                        -0.585215344 -0.511464942
## ProsperRating..numeric.                 1.000000000  0.757705549
## ProsperScore                            0.757705549  1.000000000
## EmploymentStatusDuration                0.009967765 -0.021738498
## CreditScoreRangeLower                   0.693808904  0.468153263
## CreditScoreRangeUpper                   0.693808904  0.468153263
## OpenCreditLines                         0.019472625 -0.003469519
## OpenRevolvingAccounts                   0.059926376  0.051447785
## OpenRevolvingMonthlyPayment             0.018444291  0.004127153
## CurrentDelinquencies                   -0.191535615 -0.180545589
## AmountDelinquent                       -0.058112148 -0.069222221
## StatedMonthlyIncome                     0.182653529  0.149059321
## TotalProsperPaymentsBilled             -0.028280944  0.048789718
## OnTimeProsperPayments                  -0.016693974  0.060967879
## LoanOriginalAmount                      0.378456072  0.265941623
## Recommendations                        -0.001580269  0.028654751
##                             EmploymentStatusDuration CreditScoreRangeLower
## BorrowerRate                            -0.006677846           -0.67763757
## EstimatedLoss                           -0.011048442           -0.63777703
## EstimatedReturn                         -0.001880237           -0.46092984
## ProsperRating..numeric.                  0.009967765            0.69380890
## ProsperScore                            -0.021738498            0.46815326
## EmploymentStatusDuration                 1.000000000            0.02803261
## CreditScoreRangeLower                    0.028032614            1.00000000
## CreditScoreRangeUpper                    0.028032614            1.00000000
## OpenCreditLines                          0.101591066            0.04730214
## OpenRevolvingAccounts                    0.109458477            0.06219749
## OpenRevolvingMonthlyPayment              0.130917140            0.01548968
## CurrentDelinquencies                     0.040869571           -0.20293154
## AmountDelinquent                         0.018650875           -0.05950110
## StatedMonthlyIncome                      0.088275588            0.14136536
## TotalProsperPaymentsBilled               0.052261809           -0.06595384
## OnTimeProsperPayments                    0.053694338           -0.05340699
## LoanOriginalAmount                       0.021701617            0.29649322
## Recommendations                         -0.010011644            0.01240691
##                             CreditScoreRangeUpper OpenCreditLines
## BorrowerRate                          -0.67763757    -0.014502203
## EstimatedLoss                         -0.63777703    -0.020816018
## EstimatedReturn                       -0.46092984    -0.010145916
## ProsperRating..numeric.                0.69380890     0.019472625
## ProsperScore                           0.46815326    -0.003469519
## EmploymentStatusDuration               0.02803261     0.101591066
## CreditScoreRangeLower                  1.00000000     0.047302142
## CreditScoreRangeUpper                  1.00000000     0.047302142
## OpenCreditLines                        0.04730214     1.000000000
## OpenRevolvingAccounts                  0.06219749     0.874976944
## OpenRevolvingMonthlyPayment            0.01548968     0.549298914
## CurrentDelinquencies                  -0.20293154    -0.133749451
## AmountDelinquent                      -0.05950110    -0.071660530
## StatedMonthlyIncome                    0.14136536     0.245719475
## TotalProsperPaymentsBilled            -0.06595384     0.054588933
## OnTimeProsperPayments                 -0.05340699     0.062047323
## LoanOriginalAmount                     0.29649322     0.138424850
## Recommendations                        0.01240691     0.003112712
##                             OpenRevolvingAccounts
## BorrowerRate                          -0.05399456
## EstimatedLoss                         -0.06154182
## EstimatedReturn                       -0.02762364
## ProsperRating..numeric.                0.05992638
## ProsperScore                           0.05144779
## EmploymentStatusDuration               0.10945848
## CreditScoreRangeLower                  0.06219749
## CreditScoreRangeUpper                  0.06219749
## OpenCreditLines                        0.87497694
## OpenRevolvingAccounts                  1.00000000
## OpenRevolvingMonthlyPayment            0.56062965
## CurrentDelinquencies                  -0.13089589
## AmountDelinquent                      -0.06508789
## StatedMonthlyIncome                    0.16756101
## TotalProsperPaymentsBilled             0.01734922
## OnTimeProsperPayments                  0.02591960
## LoanOriginalAmount                     0.13168860
## Recommendations                        0.01467922
##                             OpenRevolvingMonthlyPayment
## BorrowerRate                               -0.011158846
## EstimatedLoss                              -0.029889251
## EstimatedReturn                             0.016209249
## ProsperRating..numeric.                     0.018444291
## ProsperScore                                0.004127153
## EmploymentStatusDuration                    0.130917140
## CreditScoreRangeLower                       0.015489683
## CreditScoreRangeUpper                       0.015489683
## OpenCreditLines                             0.549298914
## OpenRevolvingAccounts                       0.560629653
## OpenRevolvingMonthlyPayment                 1.000000000
## CurrentDelinquencies                       -0.120141931
## AmountDelinquent                           -0.048446594
## StatedMonthlyIncome                         0.351034261
## TotalProsperPaymentsBilled                  0.011718293
## OnTimeProsperPayments                       0.018903209
## LoanOriginalAmount                          0.158422777
## Recommendations                            -0.027648115
##                             CurrentDelinquencies AmountDelinquent
## BorrowerRate                          0.19247208       0.05832432
## EstimatedLoss                         0.20761201       0.05694464
## EstimatedReturn                       0.09167796       0.03521469
## ProsperRating..numeric.              -0.19153561      -0.05811215
## ProsperScore                         -0.18054559      -0.06922222
## EmploymentStatusDuration              0.04086957       0.01865088
## CreditScoreRangeLower                -0.20293154      -0.05950110
## CreditScoreRangeUpper                -0.20293154      -0.05950110
## OpenCreditLines                      -0.13374945      -0.07166053
## OpenRevolvingAccounts                -0.13089589      -0.06508789
## OpenRevolvingMonthlyPayment          -0.12014193      -0.04844659
## CurrentDelinquencies                  1.00000000       0.42922662
## AmountDelinquent                      0.42922662       1.00000000
## StatedMonthlyIncome                  -0.02356340       0.02384595
## TotalProsperPaymentsBilled            0.05202131       0.02591850
## OnTimeProsperPayments                 0.04572720       0.02174984
## LoanOriginalAmount                   -0.12062921      -0.02479796
## Recommendations                       0.01086195       0.01361218
##                             StatedMonthlyIncome TotalProsperPaymentsBilled
## BorrowerRate                       -0.175402514                0.015204584
## EstimatedLoss                      -0.163109308                0.037546657
## EstimatedReturn                    -0.129163303               -0.026208202
## ProsperRating..numeric.             0.182653529               -0.028280944
## ProsperScore                        0.149059321                0.048789718
## EmploymentStatusDuration            0.088275588                0.052261809
## CreditScoreRangeLower               0.141365364               -0.065953838
## CreditScoreRangeUpper               0.141365364               -0.065953838
## OpenCreditLines                     0.245719475                0.054588933
## OpenRevolvingAccounts               0.167561010                0.017349217
## OpenRevolvingMonthlyPayment         0.351034261                0.011718293
## CurrentDelinquencies               -0.023563403                0.052021306
## AmountDelinquent                    0.023845950                0.025918499
## StatedMonthlyIncome                 1.000000000               -0.003911465
## TotalProsperPaymentsBilled         -0.003911465                1.000000000
## OnTimeProsperPayments              -0.004680939                0.989833952
## LoanOriginalAmount                  0.319067492                0.008807879
## Recommendations                    -0.008996549                0.108737733
##                             OnTimeProsperPayments LoanOriginalAmount
## BorrowerRate                          0.002647098       -0.332415965
## EstimatedLoss                         0.027420598       -0.373418167
## EstimatedReturn                      -0.037520478       -0.140509818
## ProsperRating..numeric.              -0.016693974        0.378456072
## ProsperScore                          0.060967879        0.265941623
## EmploymentStatusDuration              0.053694338        0.021701617
## CreditScoreRangeLower                -0.053406989        0.296493221
## CreditScoreRangeUpper                -0.053406989        0.296493221
## OpenCreditLines                       0.062047323        0.138424850
## OpenRevolvingAccounts                 0.025919597        0.131688600
## OpenRevolvingMonthlyPayment           0.018903209        0.158422777
## CurrentDelinquencies                  0.045727195       -0.120629209
## AmountDelinquent                      0.021749841       -0.024797957
## StatedMonthlyIncome                  -0.004680939        0.319067492
## TotalProsperPaymentsBilled            0.989833952        0.008807879
## OnTimeProsperPayments                 1.000000000        0.013513340
## LoanOriginalAmount                    0.013513340        1.000000000
## Recommendations                       0.109053878       -0.016158229
##                             Recommendations
## BorrowerRate                   -0.007944508
## EstimatedLoss                   0.015460914
## EstimatedReturn                -0.033922843
## ProsperRating..numeric.        -0.001580269
## ProsperScore                    0.028654751
## EmploymentStatusDuration       -0.010011644
## CreditScoreRangeLower           0.012406915
## CreditScoreRangeUpper           0.012406915
## OpenCreditLines                 0.003112712
## OpenRevolvingAccounts           0.014679223
## OpenRevolvingMonthlyPayment    -0.027648115
## CurrentDelinquencies            0.010861954
## AmountDelinquent                0.013612180
## StatedMonthlyIncome            -0.008996549
## TotalProsperPaymentsBilled      0.108737733
## OnTimeProsperPayments           0.109053878
## LoanOriginalAmount             -0.016158229
## Recommendations                 1.000000000

There is a high correlation between BorrowerRate and EstimatedLoss (about 0.91).
This makes sense since investors should require higher rates for riskier assets.
As expected, BorrowerRate is negatively correlated with ProsperScore (about
-0.75), ProsperRating (about -0.95) and the borrower's credit score (about
-0.68). I wonder what factors are used to determine the interest rate for each
listing.
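A correlation matrix like the one above can be produced with base R's `cor()`. The sketch below uses made-up rows, and the choice of `use = "pairwise.complete.obs"` to handle missing values is an assumption; the original analysis may have treated `NA`s differently.

```r
# Made-up rows with a few missing values, mimicking the NA pattern
# seen in the Prosper columns.
toy <- data.frame(
  BorrowerRate  = c(0.08, 0.12, 0.16, 0.21, 0.28, 0.31),
  EstimatedLoss = c(0.02, 0.03, NA,   0.09, 0.13, 0.16),
  ProsperScore  = c(10,   8,    7,    5,    3,    NA)
)

# Pearson correlations, computed for each pair of columns on the rows
# where both values are present (assumed NA handling).
cm <- cor(toy, use = "pairwise.complete.obs")
print(round(cm, 2))
```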

From the scatterplot, it is not very clear whether StatedMonthlyIncome is
correlated with the BorrowerRate. Let’s try a boxplot by IncomeRange.

##     item         group1 vars     n      mean         sd  median   trimmed
## X17    7  Not displayed    1  7741 0.1891813 0.06917048 0.18800 0.1896773
## X18    8   Not employed    1   806 0.2467031 0.07621766 0.25995 0.2543969
## X11    1             $0    1   621 0.1951807 0.08035309 0.17500 0.1909187
## X12    2      $1-24,999    1  7274 0.2205589 0.07756052 0.21990 0.2224778
## X14    4 $25,000-49,999    1 32192 0.2071791 0.07445022 0.20150 0.2065862
## X15    5 $50,000-74,999    1 31050 0.1903349 0.07315170 0.18000 0.1872053
## X16    6 $75,000-99,999    1 16916 0.1809260 0.07276417 0.16990 0.1768378
## X13    3      $100,000+    1 17337 0.1692426 0.07095827 0.15500 0.1635624
##            mad   min    max  range       skew   kurtosis           se
## X17 0.07857780 0.000 0.4975 0.4975  0.0546311 -0.8452869 0.0007861805
## X18 0.08562015 0.040 0.3500 0.3100 -0.6718467 -0.7072609 0.0026846524
## X11 0.07413000 0.005 0.3500 0.3450  0.4473474 -0.7212559 0.0032244586
## X12 0.10140984 0.000 0.3600 0.3600 -0.1090307 -1.0869585 0.0009093981
## X14 0.08702862 0.000 0.3600 0.3600  0.1032195 -0.9965130 0.0004149464
## X15 0.08154300 0.000 0.3600 0.3600  0.3354095 -0.8266400 0.0004151391
## X16 0.07561260 0.000 0.3600 0.3600  0.4370948 -0.7218477 0.0005594596
## X13 0.06834786 0.000 0.3600 0.3600  0.6252797 -0.4377076 0.0005389097

From the boxplot, as expected, the higher the borrower's income, the lower
the BorrowerRate, which suggests the loan is seen as less risky. However, it
is strange that the median BorrowerRate for the IncomeRange = $0 group (0.175)
is lower than the medians for the income ranges between $1 and $74,999
(0.180 to 0.220).
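The group comparison behind this observation boils down to a median per category. A minimal base R sketch on made-up rows is shown here; it is not the code that generated the summary table above.

```r
# Made-up rows; the real data has many more levels and observations.
toy <- data.frame(
  IncomeRange  = c("$0", "$0", "$1-24,999", "$1-24,999",
                   "$100,000+", "$100,000+"),
  BorrowerRate = c(0.15, 0.20, 0.20, 0.24, 0.12, 0.18)
)

# Median BorrowerRate within each income range.
med <- tapply(toy$BorrowerRate, toy$IncomeRange, median)
print(med)
```

The same pattern applies to the other categorical comparisons in this section (EmploymentStatus, ProsperRating..Alpha., Term and ListingCategory..numeric.).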

##     item        group1 vars     n      mean         sd median   trimmed
## X11    1                  1  2255 0.1855432 0.07145271 0.1780 0.1839139
## X12    2      Employed    1 67322 0.1927906 0.07185477 0.1840 0.1907290
## X13    3     Full-time    1 26355 0.1870060 0.08154179 0.1724 0.1816353
## X14    4 Not available    1  5347 0.1914925 0.06801645 0.1900 0.1930621
## X15    5  Not employed    1   835 0.2440788 0.07710970 0.2599 0.2514864
## X16    6         Other    1  3806 0.2136962 0.07153107 0.2099 0.2152006
## X17    7     Part-time    1  1088 0.1844003 0.08013861 0.1690 0.1782154
## X18    8       Retired    1   795 0.1944420 0.08514592 0.1829 0.1906915
## X19    9 Self-employed    1  6134 0.2022686 0.07669170 0.1899 0.2010586
##            mad    min    max  range        skew   kurtosis           se
## X11 0.08821470 0.0000 0.4975 0.4975  0.29486124 -0.2447385 0.0015046845
## X12 0.08287734 0.0450 0.3600 0.3150  0.23778322 -0.9415730 0.0002769345
## X13 0.09177294 0.0000 0.3600 0.3600  0.46765034 -0.8037381 0.0005022833
## X14 0.08154300 0.0000 0.3000 0.3000 -0.06684691 -1.1089883 0.0009301625
## X15 0.08569428 0.0100 0.3500 0.3400 -0.63222656 -0.7430716 0.0026684911
## X16 0.08821470 0.0565 0.3500 0.2935 -0.05373143 -1.0644397 0.0011594722
## X17 0.08665797 0.0100 0.3500 0.3400  0.56987018 -0.6229219 0.0024295583
## X18 0.10808154 0.0500 0.3500 0.3000  0.32555855 -1.0725602 0.0030198145
## X19 0.08910426 0.0100 0.3500 0.3400  0.21130832 -1.1135670 0.0009792114

As expected, the median BorrowerRate for employed borrowers (0.184) is
lower than for those who are not employed (0.260). However, I am curious
why the median BorrowerRate for part-time borrowers (0.169) is lower than
for full-time borrowers (0.172).

From the scatterplot, it is clear that the higher the borrower's credit
score, the lower the BorrowerRate.

##     item group1 vars     n       mean         sd median    trimmed
## X11    1           1 29084 0.18325969 0.07455439 0.1700 0.17891657
## X13    3     AA    1  5372 0.07912197 0.01477932 0.0779 0.07754751
## X12    2      A    1 14551 0.11294028 0.01728524 0.1119 0.11214183
## X14    4      B    1 15581 0.15445193 0.01987918 0.1509 0.15296343
## X15    5      C    1 18345 0.19443037 0.02420360 0.1914 0.19281748
## X16    6      D    1 14274 0.24641703 0.02537442 0.2492 0.24675070
## X17    7      E    1  9795 0.29333845 0.02603741 0.2925 0.29349879
## X18    8     HR    1  6935 0.31732500 0.01889059 0.3177 0.31730036
##            mad    min    max  range       skew   kurtosis           se
## X11 0.07709520 0.0000 0.4975 0.4975  0.4981315 -0.5109810 0.0004371658
## X13 0.01037820 0.0400 0.2100 0.1700  1.3537538  4.4750141 0.0002016445
## X12 0.01779120 0.0498 0.2150 0.1652  0.4983194  0.9555580 0.0001432943
## X14 0.01630860 0.0693 0.3500 0.2807  0.8327298  1.6341363 0.0001592578
## X15 0.02283204 0.0895 0.3500 0.2605  0.6175170  1.0506435 0.0001786986
## X16 0.02698332 0.1157 0.3500 0.2343 -0.1766243  0.9695377 0.0002123847
## X17 0.03157938 0.1479 0.3600 0.2121 -0.2818067  0.6902452 0.0002630847
## X18 0.00000000 0.1779 0.3600 0.1821 -3.3437225 22.4649500 0.0002268414

This is as expected: I suspect that the borrower's credit score helps
determine the credit rating of the loan, and a worse rating requires a
higher interest rate.

##     item group1 vars     n      mean         sd median   trimmed
## X11    1     12    1  1614 0.1500807 0.06785817 0.1434 0.1478215
## X12    2     36    1 87778 0.1934855 0.07925234 0.1815 0.1910764
## X13    3     60    1 24545 0.1929907 0.05566590 0.1870 0.1904665
##            mad    min    max  range      skew   kurtosis           se
## X11 0.08376690 0.0400 0.2669 0.2269 0.2374626 -1.1537038 0.0016890807
## X12 0.09696204 0.0000 0.4975 0.4975 0.2586755 -1.0563516 0.0002674972
## X13 0.06004530 0.0669 0.3304 0.2635 0.3729595 -0.5508182 0.0003553102

As expected, the shorter-term (12-month) loans have a lower median BorrowerRate (0.1434) than the longer-term (36- and 60-month) loans, since longer-term loans are more exposed to interest rate risk. However, I am surprised that the difference between the median BorrowerRate of 36-month loans (0.1815) and 60-month loans (0.1870) is small.

##      item group1 vars     n      mean         sd  median   trimmed
## X11     1      0    1 16965 0.1815787 0.06623780 0.17590 0.1802810
## X12     2      1    1 58308 0.1882613 0.07166938 0.17740 0.1846506
## X13     3      2    1  7433 0.1981773 0.07912518 0.19050 0.1973414
## X14     4      3    1  7189 0.2005974 0.08069144 0.19000 0.1996387
## X15     5      4    1  2395 0.1806073 0.08726652 0.15750 0.1737612
## X16     6      5    1   756 0.2052496 0.09299629 0.19000 0.2031168
## X17     7      6    1  2572 0.2068046 0.08307643 0.20490 0.2074184
## X18     8      7    1 10494 0.2134143 0.08410435 0.21510 0.2152115
## X19     9      8    1   199 0.1857457 0.07343137 0.17680 0.1827484
## X110   10      9    1    85 0.1739812 0.06510994 0.16390 0.1718942
## X111   11     10    1    91 0.2259484 0.07248261 0.24490 0.2305000
## X112   12     11    1   217 0.1960535 0.07106779 0.19790 0.1955937
## X113   13     12    1    59 0.2002458 0.08765140 0.19700 0.2016980
## X114   14     13    1  1996 0.2237560 0.07267137 0.22870 0.2277389
## X115   15     14    1   876 0.1909197 0.07220494 0.18425 0.1894068
## X116   16     15    1  1522 0.2090848 0.07323964 0.20990 0.2104245
## X117   17     16    1   304 0.1975829 0.07541715 0.19080 0.1968902
## X118   18     17    1    52 0.2073250 0.07876661 0.21360 0.2098571
## X119   19     18    1   885 0.2065225 0.07221377 0.20850 0.2076891
## X120   20     19    1   768 0.2075052 0.07545702 0.20850 0.2090977
## X121   21     20    1   771 0.2065501 0.06945222 0.20850 0.2071968
##             mad    min    max  range         skew   kurtosis           se
## X11  0.07946736 0.0000 0.4975 0.4975  0.196884092 -0.7896442 0.0005085445
## X12  0.07694694 0.0000 0.3600 0.3600  0.400053492 -0.7491043 0.0002968039
## X13  0.09711030 0.0400 0.3500 0.3100  0.123992168 -1.1096628 0.0009177674
## X14  0.09799986 0.0000 0.3600 0.3600  0.158337634 -1.1379884 0.0009516850
## X15  0.07857780 0.0100 0.3600 0.3500  0.620523367 -0.6886193 0.0017831789
## X16  0.11275173 0.0100 0.3600 0.3500  0.265264207 -1.2460401 0.0033822423
## X17  0.10267005 0.0499 0.3500 0.3001 -0.006074911 -1.2288338 0.0016381073
## X18  0.10615416 0.0000 0.3600 0.3600 -0.115549955 -1.1855496 0.0008210090
## X19  0.08584254 0.0605 0.3304 0.2699  0.341296047 -1.0297029 0.0052054117
## X110 0.07413000 0.0628 0.3177 0.2549  0.278219377 -0.7904024 0.0070621642
## X111 0.09592422 0.0565 0.3304 0.2739 -0.347779888 -0.9495735 0.0075982421
## X112 0.07709520 0.0565 0.3304 0.2739  0.050226753 -0.8871763 0.0048243961
## X113 0.10689546 0.0565 0.3304 0.2739  0.009718636 -1.3009625 0.0114112409
## X114 0.09325554 0.0565 0.3304 0.2739 -0.272367419 -1.0195918 0.0016266087
## X115 0.08799231 0.0565 0.3304 0.2739  0.171303401 -1.0336329 0.0024395788
## X116 0.08984556 0.0565 0.3304 0.2739 -0.066058939 -1.0967414 0.0018773226
## X117 0.08613906 0.0608 0.3304 0.2696  0.112746351 -1.0711814 0.0043254700
## X118 0.09792573 0.0605 0.3304 0.2699 -0.179007911 -1.2021691 0.0109229631
## X119 0.07917084 0.0565 0.3304 0.2739 -0.052175433 -0.9298750 0.0024274393
## X120 0.09295902 0.0565 0.3304 0.2739 -0.047693327 -1.1002904 0.0027228208
## X121 0.08213604 0.0605 0.3304 0.2699 -0.049127896 -0.9542616 0.0025012608

Different categories of loans can have different BorrowerRates,
but I will not go deeper.

There is no clear correlation between number of credit lines/open revolving
accounts and BorrowerRate. However, the number of delinquencies/public records
is positively correlated with the BorrowerRate.

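From the scatterplot, the size of the loan does not seem to be strongly correlated with the BorrowerRate, although the correlation matrix above shows a weak negative correlation (about -0.33) for LoanOriginalAmount.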

## 
## Call:
## lm(formula = BorrowerRate ~ StatedMonthlyIncome, data = loan_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19740 -0.05870 -0.00920  0.05823  1.67782 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.978e-01  2.760e-04  716.62   <2e-16 ***
## StatedMonthlyIncome -8.902e-07  2.952e-08  -30.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07452 on 113935 degrees of freedom
## Multiple R-squared:  0.007918,   Adjusted R-squared:  0.007909 
## F-statistic: 909.3 on 1 and 113935 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = BorrowerRate ~ CreditScoreRangeLower, data = loan_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49873 -0.05051 -0.01165  0.04585  0.21868 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            5.487e-01  2.041e-03   268.9   <2e-16 ***
## CreditScoreRangeLower -5.190e-04  2.963e-06  -175.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0663 on 113344 degrees of freedom
##   (591 observations deleted due to missingness)
## Multiple R-squared:  0.213,  Adjusted R-squared:  0.213 
## F-statistic: 3.068e+04 on 1 and 113344 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = BorrowerRate ~ EmploymentStatus, data = loan_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.234079 -0.058791 -0.008791  0.057809  0.311957 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    0.185543   0.001569 118.270  < 2e-16 ***
## EmploymentStatusEmployed       0.007247   0.001595   4.544 5.52e-06 ***
## EmploymentStatusFull-time      0.001463   0.001635   0.895  0.37082    
## EmploymentStatusNot available  0.005949   0.001871   3.180  0.00147 ** 
## EmploymentStatusNot employed   0.058536   0.003018  19.396  < 2e-16 ***
## EmploymentStatusOther          0.028153   0.001980  14.221  < 2e-16 ***
## EmploymentStatusPart-time     -0.001143   0.002750  -0.416  0.67770    
## EmploymentStatusRetired        0.008899   0.003073   2.896  0.00378 ** 
## EmploymentStatusSelf-employed  0.016725   0.001835   9.116  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0745 on 113928 degrees of freedom
## Multiple R-squared:  0.008622,   Adjusted R-squared:  0.008552 
## F-statistic: 123.9 on 8 and 113928 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = BorrowerRate ~ CurrentDelinquencies, data = loan_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.47348 -0.05815 -0.00904  0.05806  0.17106 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.1889351  0.0002282  827.96   <2e-16 ***
## CurrentDelinquencies 0.0066680  0.0001105   60.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07357 on 113238 degrees of freedom
##   (697 observations deleted due to missingness)
## Multiple R-squared:  0.03116,    Adjusted R-squared:  0.03115 
## F-statistic:  3642 on 1 and 113238 DF,  p-value: < 2.2e-16

The R-Squared values show that it is not sufficient to explain the borrower
rate with just one variable.
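To compare how much variance each single predictor explains, the R-squared values can be pulled out of the fitted models directly. The sketch below uses made-up rows and only two of the predictors; the numbers it prints are not those of the Prosper data.

```r
# Made-up rows: credit score tracks the rate closely, delinquencies do not.
toy <- data.frame(
  BorrowerRate          = c(0.30, 0.27, 0.22, 0.20, 0.15, 0.13, 0.09, 0.08),
  CreditScoreRangeLower = c(600, 620, 660, 680, 700, 720, 760, 780),
  CurrentDelinquencies  = c(2, 0, 1, 0, 3, 0, 0, 1)
)

predictors <- c("CreditScoreRangeLower", "CurrentDelinquencies")

# Fit one single-predictor model per variable and collect its R-squared.
r2 <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "BorrowerRate"), data = toy)
  summary(fit)$r.squared
})
print(round(r2, 3))
```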

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

The BorrowerRate is strongly tied to the Prosper Credit Rating: its correlation with the numeric rating is about -0.95, so riskier ratings carry higher rates. It is likely that the rate ranges are determined by the loan credit rating.

Other features that are correlated with BorrowerRate, though less strongly, include the borrower’s credit score and stated monthly income (both negatively), and the number of current delinquencies and the number of public records in the past 10 years (both positively).

The features that do not seem to be correlated with BorrowerRate are the number of open credit lines and the number of open revolving accounts. The size of the loan shows only a weak negative correlation (about -0.33).

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

The interest rate also depends on the term of the loan. Shorter-term loans have lower required interest rates than longer-term loans do. Another interesting observation is that the median interest rate of loan category 10 (Cosmetic Procedure) is considerably higher than that of loan category 4 (Personal Loan), about 24.5% vs 15.8%.

What was the strongest relationship you found?

The strongest relationship is between the BorrowerRate and the Prosper Credit Rating of the listing (correlation of about -0.95 with the numeric rating): the riskier the rating, the higher the rate. This is expected, as the rating is a classification of riskiness. The more interesting question is how the riskiness of each listing is determined.

Multivariate Plots Section

The Prosper settlement records do not seem to be correlated to the BorrowerRate.

Short-term listings (12-month) clearly have a lower BorrowerRate. Also, as the
credit score increases, the BorrowerRate for short-term listings seems to
decrease. However, for the longer-term listings (36-month and 60-month), the
variance is high. It is not clear which one has the lower BorrowerRate, or
whether credit score matters.

This plot is dominated by the Employed status and does not reveal much.

BorrowerRate is clearly layered by ProsperRating..Alpha. The BorrowerRate
also has a moderate negative correlation with the credit score of the borrower.

Shorter term listings clearly have lower BorrowerRate. The rates are further
layered by ProsperRating..Alpha. We can also see moderate negative correlation
between BorrowerRate and CreditScore.

## 
## Calls:
## m1: lm(formula = I(BorrowerRate) ~ CreditScoreRangeLower, data = subset(loan_data, 
##     , as.character(ProsperScore) != ""))
## m2: lm(formula = I(BorrowerRate) ~ CreditScoreRangeLower + StatedMonthlyIncome, 
##     data = subset(loan_data, , as.character(ProsperScore) != 
##         ""))
## m3: lm(formula = I(BorrowerRate) ~ CreditScoreRangeLower + StatedMonthlyIncome + 
##     CurrentDelinquencies, data = subset(loan_data, , as.character(ProsperScore) != 
##     ""))
## m4: lm(formula = I(BorrowerRate) ~ CreditScoreRangeLower + StatedMonthlyIncome + 
##     CurrentDelinquencies + ProsperRating..Alpha., data = subset(loan_data, 
##     , as.character(ProsperScore) != ""))
## m5: lm(formula = I(BorrowerRate) ~ CreditScoreRangeLower + StatedMonthlyIncome + 
##     CurrentDelinquencies + ProsperRating..Alpha. + Term, data = subset(loan_data, 
##     , as.character(ProsperScore) != ""))
## m6: lm(formula = I(BorrowerRate) ~ CreditScoreRangeLower + StatedMonthlyIncome + 
##     CurrentDelinquencies + ProsperRating..Alpha. + Term + EmploymentStatus, 
##     data = subset(loan_data, , as.character(ProsperScore) != 
##         ""))
## 
## ===================================================================================================================================
##                                          m1              m2              m3              m4              m5              m6        
## -----------------------------------------------------------------------------------------------------------------------------------
##   (Intercept)                            0.549***        0.548***        0.567***        0.381***        0.362***        0.358***  
##                                         (0.002)         (0.002)         (0.002)         (0.001)         (0.001)         (0.002)    
##   CreditScoreRangeLower                 -0.001***       -0.001***       -0.001***       -0.000***       -0.000***       -0.000***  
##                                         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   StatedMonthlyIncome                                   -0.000***       -0.000***       -0.000***       -0.000***       -0.000**   
##                                                         (0.000)         (0.000)         (0.000)         (0.000)         (0.000)    
##   CurrentDelinquencies                                                   0.000           0.003***        0.003***        0.003***  
##                                                                         (0.000)         (0.000)         (0.000)         (0.000)    
##   ProsperRating..Alpha.: A                                                              -0.041***       -0.044***       -0.042***  
##                                                                                         (0.000)         (0.000)         (0.001)    
##   ProsperRating..Alpha.: AA                                                             -0.061***       -0.062***       -0.059***  
##                                                                                         (0.001)         (0.001)         (0.001)    
##   ProsperRating..Alpha.: B                                                              -0.007***       -0.012***       -0.010***  
##                                                                                         (0.000)         (0.000)         (0.001)    
##   ProsperRating..Alpha.: C                                                               0.028***        0.021***        0.023***  
##                                                                                         (0.000)         (0.000)         (0.001)    
##   ProsperRating..Alpha.: D                                                               0.076***        0.072***        0.074***  
##                                                                                         (0.000)         (0.000)         (0.001)    
##   ProsperRating..Alpha.: E                                                               0.117***        0.115***        0.116***  
##                                                                                         (0.000)         (0.000)         (0.001)    
##   ProsperRating..Alpha.: HR                                                              0.146***        0.146***        0.147***  
##                                                                                         (0.001)         (0.000)         (0.001)    
##   Term                                                                                                   0.001***        0.001***  
##                                                                                                         (0.000)         (0.000)    
##   EmploymentStatus: Employed                                                                                             0.007***  
##                                                                                                                         (0.001)    
##   EmploymentStatus: Full-time                                                                                            0.012***  
##                                                                                                                         (0.001)    
##   EmploymentStatus: Not available                                                                                       -0.000     
##                                                                                                                         (0.001)    
##   EmploymentStatus: Not employed                                                                                         0.022***  
##                                                                                                                         (0.002)    
##   EmploymentStatus: Other                                                                                                0.007***  
##                                                                                                                         (0.001)    
##   EmploymentStatus: Part-time                                                                                            0.006***  
##                                                                                                                         (0.001)    
##   EmploymentStatus: Retired                                                                                              0.013***  
##                                                                                                                         (0.002)    
##   EmploymentStatus: Self-employed                                                                                        0.012***  
##                                                                                                                         (0.001)    
## -----------------------------------------------------------------------------------------------------------------------------------
##   R-squared                              0.213           0.215           0.225           0.755           0.762           0.764     
##   adj. R-squared                         0.213           0.215           0.225           0.755           0.762           0.764     
##   sigma                                  0.066           0.066           0.066           0.037           0.036           0.036     
##   F                                  30684.346       15478.726       10984.621       34875.500       32937.420       19264.465     
##   p                                      0.000           0.000           0.000           0.000           0.000           0.000     
##   Log-likelihood                    146748.641      146856.110      147496.000      212647.834      214285.781      214729.232     
##   Deviance                             498.165         497.221         489.984         155.040         150.620         149.444     
##   AIC                              -293491.283     -293704.220     -294982.000     -425271.667     -428545.563     -429416.464     
##   BIC                              -293462.368     -293665.667     -294933.814     -425156.020     -428420.278     -429214.081     
##   N                                 113346          113346          113240          113240          113240          113240         
## ===================================================================================================================================

The variables in this linear model account for 76.4% of the variance in the BorrowerRate of the loan listings. The most important factor is the Prosper credit rating of the loan, followed by the credit score of the borrower. The other variables contribute little to the prediction: adding Term and EmploymentStatus raises R-squared only from 0.755 to 0.764.
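The way R-squared grows as predictors are added can be reproduced with nested `lm` fits. The sketch below is only an illustration under stated assumptions: the data frame is simulated (the column names echo the Prosper columns, but every value is made up), and the model order is a hypothetical one, not necessarily the order used for the table above.

```r
# Simulated stand-in for the Prosper data. Column names mirror the real ones,
# but all values are made up for illustration only.
set.seed(42)
n <- 500
ratings <- c("AA", "A", "B", "C", "D", "E", "HR")
loans <- data.frame(
  CreditScore = round(runif(n, 600, 850)),
  StatedMonthlyIncome = round(rlnorm(n, meanlog = 8.5, sdlog = 0.5)),
  ProsperRating = factor(sample(ratings, n, replace = TRUE), levels = ratings)
)

# Hypothetical rate: driven mostly by the rating, slightly by the credit score.
loans$BorrowerRate <- 0.06 +
  0.035 * (as.integer(loans$ProsperRating) - 1) -
  0.0001 * (loans$CreditScore - 600) +
  rnorm(n, sd = 0.02)

# Nested models: each one adds a single predictor to the previous model.
m1 <- lm(BorrowerRate ~ CreditScore, data = loans)
m2 <- update(m1, . ~ . + StatedMonthlyIncome)
m3 <- update(m2, . ~ . + ProsperRating)

# R-squared of each model, in the order the predictors were added.
r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3),
             function(m) summary(m)$r.squared)
print(round(r2, 3))
```

Because the models are nested, R-squared cannot decrease from one step to the next; the size of each jump shows how much the newly added predictor explains beyond the ones already in the model.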

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

ProsperRating..Alpha clearly determines the range of the BorrowerRate. The BorrowerRate also has a moderate negative correlation with the credit score of the borrower. The stated monthly income has a weaker correlation with the borrower rate.

The 12-month loans clearly have lower borrower rates than the 36-month and 60-month loans.

Were there any interesting or surprising interactions between features?

Yes, I am surprised that the stated monthly income does not matter much in determining the borrower rates. Also, I would have thought that the 60-month loans would have higher interest rates than the 36-month loans, since the former are more exposed to interest rate risk.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

Yes, I created models to predict a loan’s borrower rate from the borrower’s credit score, stated monthly income, the Prosper credit rating of the listing, loan term, and the borrower’s employment status. These variables account for 76.4% of the variance in the BorrowerRate. However, I still cannot find the factors that determine the credit rating of the loan listings.


Final Plots and Summary

Plot One

Description One

The loans are categorized by riskiness, from low risk to high risk (AA -> HR). The riskier the loan, the higher the interest rate it requires.

Plot Two

Description Two

Shorter-term listings have a lower BorrowerRate. The rates are further layered
by ProsperRating..Alpha. We can also see a moderate negative correlation between
BorrowerRate and CreditScore.

Plot Three

Description Three

In comparison with Plot Two, the correlation between borrower rate and
stated monthly income is weaker than that between borrower rate and borrower’s
credit score, although one would expect the borrower’s income to play a
critical role in determining the riskiness of a loan.
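The comparison of correlation strengths can be made numeric with `cor`. This is a minimal sketch on simulated vectors (not Prosper data), in which the rate is deliberately constructed to depend more on the credit score than on income. On the real data the same calls would take the BorrowerRate, CreditScore, and StatedMonthlyIncome columns instead.

```r
set.seed(1)
n <- 300

# Simulated columns, made up for illustration only.
credit_score <- runif(n, 600, 850)
income <- rlnorm(n, meanlog = 8.5, sdlog = 0.5)
rate <- 0.45 - 0.0004 * credit_score - 0.000002 * income + rnorm(n, sd = 0.03)

# Pearson correlation of the rate with each candidate predictor.
cor_score <- cor(rate, credit_score)
cor_income <- cor(rate, income)

# Spearman's rank correlation is less sensitive to the long right tail
# that income distributions usually have.
cor_income_rank <- cor(rate, income, method = "spearman")

print(c(score = cor_score, income = cor_income, income_rank = cor_income_rank))
```

Comparing the absolute values of the coefficients, rather than judging the scatterplots by eye, gives a direct measure of which relationship is weaker.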


Reflection

The loan data set has 113,937 records across 81 variables. There is a clear gap between the pre-2009 and post-2009 listings, possibly due to the Financial Crisis in late 2008. The focus of this EDA is on the loans listed post-2009.

My objective is to find out what determines the interest rate a borrower has to pay for a loan. I suspected that the factors would include the borrower’s credit score, past settlement history, amount of debt held, income, employment, and the loan term.

First, I examined many variables, plotting their histograms and looking at their summary statistics. Next, I explored the relationships between pairs of variables. As expected, there is a strong relationship between the interest rate and the credit rating of a loan: the riskier the rating, the higher the rate. My main question, however, is what determines which credit rating a loan will receive. The scatterplot of interest rate against borrower’s credit score shows a moderate negative correlation. However, other variables such as the number of open credit lines, the number of open revolving accounts, stated monthly income, and loan size do not show a clear correlation.

When I used facet_wrap to see whether the loan term affects interest rates, I found that the 12-month loans have lower interest rates, as expected. However, I was surprised to find that there is almost no difference between the median interest rates of the 36-month and 60-month loans.
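The per-term medians can also be computed directly rather than read off the facets. The sketch below uses a handful of made-up rows in place of the Prosper data frame; only the column names `Term` and `BorrowerRate` come from the data set.

```r
# Made-up example rows; the real analysis would use the Prosper data frame.
loans <- data.frame(
  Term = c(12, 12, 36, 36, 36, 60, 60, 60),
  BorrowerRate = c(0.10, 0.12, 0.18, 0.20, 0.25, 0.19, 0.21, 0.24)
)

# Median BorrowerRate for each loan term, one row per term.
term_medians <- aggregate(BorrowerRate ~ Term, data = loans, FUN = median)
print(term_medians)
```

A table like this makes it easy to check how close the 36-month and 60-month medians really are.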

When I tried to find a linear model to predict a borrower’s interest rate, only the credit rating of the loan and the borrower’s credit score contributed significantly to the R-squared value. The contributions of the other factors are negligible.

For future exploration, I may have to include things like loan category, occupation, and collateral in order to better determine the required interest rates.